
A Reduction to no Memory Proofs

Neural Information Processing Systems

We first need the following lemma, which bounds the prediction shifts and magnitudes of Algorithm 2; see the proof in Appendix A.2. We are now ready to prove Theorem 9.

Proof of Theorem 9. We show that Algorithm 2 achieves the desired regret bound: … (Lipschitz), where the last transition used the Lipschitz assumption to bound the gradient. This concludes the second part of the lemma.

We give a general example of a BCO algorithm that may be employed in conjunction with our reduction procedure given in Algorithm 2. For a positive semi-definite matrix … Moreover, for all … we have that (1) if … The proof of Lemma 15 relies on a few standard results.
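
For context, the Lipschitz step referenced above is presumably the standard bound (a generic sketch; the symbols $f_t$, $G$, $x_t$, $y_t$ are illustrative, not the paper's notation): if each loss $f_t$ is $G$-Lipschitz over the decision set, then

\[ |f_t(x_t) - f_t(y_t)| \le G \, \lVert x_t - y_t \rVert, \]

so any bound on the shift $\lVert x_t - y_t \rVert$ between successive predictions of Algorithm 2 translates directly into a bound on the per-round regret contribution.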



Constructive Approximation under Carleman's Condition, with Applications to Smoothed Analysis

Frederic Koehler, Beining Wu

arXiv.org Machine Learning

A classical result of Carleman, based on the theory of quasianalytic functions, shows that polynomials are dense in $L^2(\mu)$ for any $\mu$ such that the moments $\int x^k \, d\mu$ do not grow too rapidly as $k \to \infty$. In this work, we develop a fairly tight quantitative analogue of the underlying Denjoy-Carleman theorem via complex analysis, and show that this allows for nonasymptotic control of the rate of approximation by polynomials for any smooth function with polynomial growth at infinity. In many cases, this allows us to establish $L^2$ approximation-theoretic results for functions over general classes of distributions (e.g., multivariate sub-Gaussian or sub-exponential distributions) which were previously known only in special cases. As one application, we show that the Paley--Wiener class of functions bandlimited to $[-\Omega, \Omega]$ admits superexponential rates of approximation over all strictly sub-exponential distributions, which leads to a new characterization of the class. As another application, we solve an open problem recently posed by Chandrasekaran, Klivans, Kontonis, Meka and Stavropoulos on the smoothed analysis of learning, and also obtain quantitative improvements to their main results and applications.
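
For reference, the classical condition in the first sentence is Carleman's condition (standard statement, not the paper's quantitative refinement): writing $m_k = \int x^k \, d\mu$ for the moments of $\mu$,

\[ \sum_{k=1}^{\infty} m_{2k}^{-1/(2k)} = \infty \quad \Longrightarrow \quad \text{polynomials are dense in } L^2(\mu). \]

In particular, if $m_{2k} \le C^{2k} (2k)!$ for some constant $C$ (as for sub-exponential $\mu$), then $m_{2k}^{-1/(2k)} \gtrsim 1/(Ck)$, the series diverges, and the condition holds.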


Graph Clustering: Block-models and model free results

Yali Wan, Marina Meila

Neural Information Processing Systems

Clustering graphs under the Stochastic Block Model (SBM) and its extensions is well studied, and guarantees of correctness exist under the assumption that the data are sampled from such a model. In this paper, we propose a framework in which we obtain "correctness" guarantees without assuming the data come from a model; the guarantees depend instead on statistics of the data that can be checked. We also show that this framework ties in with the existing model-based framework, so that we can exploit results in model-based recovery as well as strengthen existing results in that area of research.
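
To make the notion of checkable statistics concrete, here is a minimal hypothetical sketch in Python (the helper block_statistics and the choice of edge-density statistics are illustrative assumptions, not the paper's actual framework): given an adjacency matrix and a candidate clustering, within- and between-cluster edge densities can be computed directly from the data, with no model assumed.

    import numpy as np

    def block_statistics(A, labels):
        # Hypothetical helper (not from the paper): within- and
        # between-cluster edge densities of a simple graph with
        # adjacency matrix A (zero diagonal) under a candidate
        # clustering `labels`. These statistics depend only on the
        # observed data, so they can be checked without an SBM.
        ks = np.unique(labels)
        stats = {}
        for a in ks:
            for b in ks:
                ia, ib = labels == a, labels == b
                block = A[np.ix_(ia, ib)]
                # number of possible (ordered) edges in this block
                denom = ia.sum() * (ia.sum() - 1) if a == b else ia.sum() * ib.sum()
                stats[(a, b)] = block.sum() / max(denom, 1)
        return stats

    # Usage: a 4-node graph whose first three nodes form a triangle.
    A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
    labels = np.array([0, 0, 0, 1])
    print(block_statistics(A, labels))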



A Algorithms

Neural Information Processing Systems

" j for k: " 2 to n do x The result follows directly from Theorem 1 in Cranko et al. [2021]: sup Lemma 7. If Assumption 3 holds, for any " 1 is an eigenvector of H Similarly, applying Hoeffding's inequality and the Kantorovich-Rubinstein theorem gives us Probp E Theorem 9. Given a Bayesian network We prove the statements in this theorem in several steps. In order to prove (a) and (b), we will show that the DRO problem is strictly convex if true non-neighbors are known so that there is an optimal solution. We would like to show that the solution to Equation (4) with true non-neighbor constraints is optimal. In this way, we do not recover any non-neighbor nodes in the skeleton. We follow the proof of Lemma 11.2 in Hastie et al. [2015]. Until now, we have proven properties (a) and (b). In this way, we are able to recover all the neighbor nodes with a threshold β {2 . Now we are ready to prove (d). BIC is not applicable to skeletons. The best and runner-up results are marked in bold. Significant differences are marked by: (paired t-test, p ă 0. 05). The final sample complexity becomes m " O p C p ε